NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MeshAgent: Enabling Reliable Network Management with Large Language Models

https://doi.org/10.1145/3771567

Zhou, Yajie; Hsieh, Kevin; Mani, Sathiya Kumaran; Kandula, Srikanth; Liu, Zaoxing (December 2025, ACM SIGMETRICS)

Full Text Available
NetVigil: Robust and Low-Cost Anomaly Detection for East-West Data Center Security

Hsieh, Kevin; Wong, Mike; Segarra, Santiago; Mani, Sathiya Kumaran; Eberl, Trevor; Panasyuk, Anatoliy; Netravali, Ravi; Chandra, Ranveer; Kandula, Srikanth (April 2024, 21st USENIX Symposium on Networked Systems Design and Implementation)

Full Text Available
Federated Learning under Distributed Concept Drift

Jothimurugesan, Ellango; Hsieh, Kevin; Wang, Jianyu; Joshi, Gauri; Gibbons, Phillip B. (April 2023, Artificial Intelligence and Statistics Conference (AISTATS))

Federated Learning (FL) under distributed concept drift is a largely unexplored area. Although concept drift is itself a well-studied phenomenon, it poses particular challenges for FL, because drifts arise staggered in time and space (across clients). Our work is the first to explicitly study data heterogeneity in both dimensions. We first demonstrate that prior solutions to drift adaptation, with their single global model, are ill-suited to staggered drifts, necessitating multiple-model solutions. We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering. Empirical evaluation shows that our solutions achieve significantly higher accuracy than existing baselines, and are comparable to an idealized algorithm with oracle knowledge of the ground-truth clustering of clients to concepts at each time step.
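The abstract names two ingredients, local drift detection and hierarchical clustering of clients. The sketch below is a minimal illustration of those two ideas, not the algorithms proposed in the paper. It makes several assumptions. Drift is flagged when a client's loss on fresh data rises by more than a relative `tolerance`. Each client is represented by a flat model-update vector. Clients are grouped with average-linkage agglomerative clustering, and merging stops once the closest pair of clusters is farther apart than `threshold`. The function names are hypothetical.

```python
from math import dist


def detect_drift(prev_loss: float, curr_loss: float, tolerance: float = 0.5) -> bool:
    """Hypothetical local check: flag drift when loss on fresh data rises by more
    than `tolerance` relative to the client's previous loss."""
    return curr_loss > prev_loss * (1.0 + tolerance)


def agglomerative_clusters(points: dict[str, list[float]], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering of client update vectors.

    Starts with one cluster per client and repeatedly merges the closest pair,
    stopping when the closest pair is farther apart than `threshold`.
    """
    clusters = [{name} for name in points]

    def linkage(a: set[str], b: set[str]) -> float:
        # Average Euclidean distance over all cross-cluster pairs.
        pairs = [dist(points[x], points[y]) for x in a for y in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are treated as distinct concepts
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

In this reading, a server would train one model per returned cluster, and would re-run the clustering after clients report drift. How often to re-cluster, and how to seed the new models, are left open here.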
Full Text Available
Federated Learning under Distributed Concept Drift

Jothimurugesan, Ellango; Hsieh, Kevin; Wang, Jianyu; Joshi, Gauri; Gibbons, Phillip B. (January 2023, Proceedings of Machine Learning Research)

Full Text Available
RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Khani, Mehrdad; Ananthanarayanan, Ganesh; Hsieh, Kevin; Jiang, Junchen; Netravali, Ravi; Shu, Yuanchao; Alizadeh, Mohammad; Bahl, Victor (April 2023, 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23))

Full Text Available
RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Khani, Mehrdad; Ananthanarayanan, Ganesh; Hsieh, Kevin; Jiang, Junchen; Netravali, Ravi; Shu, Yuanchao; Alizadeh, Mohammad; Bahl, Victor (April 2023, USENIX Association)

Full Text Available
RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Khani, Mehrdad; Ananthanarayanan, Ganesh; Hsieh, Kevin; Jiang, Junchen; Netravali, Ravi; Shu, Yuanchao; Alizadeh, Mohammad; Bahl, Victor (April 2023, 20th USENIX Symposium on Networked Systems Design and Implementation)

Full Text Available
Towards a Cost vs. Quality Sweet Spot for Monitoring Networks

https://doi.org/10.1145/3484266.3487390

Yaseen, Nofel; Arzani, Behnaz; Chintalapudi, Krishna; Ranganathan, Vaishnavi; Frujeri, Felipe; Hsieh, Kevin; Berger, Daniel S.; Liu, Vincent; Kandula, Srikanth (November 2021, Proceedings of the Twentieth ACM Workshop on Hot Topics in Networks)

Full Text Available
The Non-IID Data Quagmire of Decentralized Machine Learning

Hsieh, Kevin; Phanishayee, Amar; Mutlu, Onur; Gibbons, Phillip B. (July 2020, 37th International Conference on Machine Learning, ICML 2020)

Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.
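The abstract treats "skewed distribution of data labels across devices/locations" as the key variable, and it reports that the degree of skew determines how hard the problem is. The sketch below shows one hypothetical way to produce such partitions with a tunable degree of skew, and one way to measure the result. It is not the paper's experimental protocol, and SkewScout itself is not shown. The pinning rule (a fraction `skew` of each label's examples goes to partition `label % num_parts`, and the remainder is spread round-robin) and the use of total variation distance as the skew measure are both assumptions.

```python
from collections import Counter


def partition_with_label_skew(labels: list[int], num_parts: int, skew: float) -> list[list[int]]:
    """Return example indices per partition (hypothetical scheme).

    A fraction `skew` of each label's examples is pinned to the partition that
    "owns" that label (label % num_parts). The rest is spread round-robin.
    skew=0.0 gives IID partitions and skew=1.0 gives fully label-disjoint ones.
    """
    by_label: dict[int, list[int]] = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)
    parts: list[list[int]] = [[] for _ in range(num_parts)]
    rr = 0
    for y, idxs in sorted(by_label.items()):
        cut = int(len(idxs) * skew)
        parts[y % num_parts].extend(idxs[:cut])  # skewed share: one partition only
        for idx in idxs[cut:]:  # IID share: spread evenly
            parts[rr % num_parts].append(idx)
            rr += 1
    return parts


def label_distribution_distance(labels: list[int], part_a: list[int], part_b: list[int]) -> float:
    """Total variation distance between the label distributions of two partitions."""
    if not part_a or not part_b:
        raise ValueError("partitions must be non-empty")
    ca = Counter(labels[i] for i in part_a)
    cb = Counter(labels[i] for i in part_b)
    classes = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[c] / len(part_a) - cb[c] / len(part_b)) for c in classes)
```

For example, with ten examples of each of two labels split across two partitions, `skew=0.0` gives a distance of 0, `skew=0.5` gives about 0.6, and `skew=1.0` gives 1.0. This makes the "degree of data skew" a single knob that an experiment can sweep.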
Full Text Available
Focus: Querying Large Video Datasets with Low Latency and Low Cost

Hsieh, Kevin; Ananthanarayanan, Ganesh; Bodík, Peter; Venkataraman, Shivaram; Bahl, Paramvir; Philipose, Matthai; Gibbons, Phillip B.; Mutlu, Onur (October 2018, 13th USENIX Symposium on Operating Systems Design and Implementation)

Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering “after the fact” queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. Current systems for processing such queries on large video datasets incur either high cost at video ingest time or high latency at query time. We present Focus, a system providing both low-cost and low-latency querying on large video datasets. Focus’s architecture flexibly and effectively divides the query processing work between ingest time and query time. At ingest time (on live videos), Focus uses cheap convolutional network classifiers (CNNs) to construct an approximate index of all possible object classes in each frame (to handle queries for any class in the future). At query time, Focus leverages this approximate index to provide low latency, but compensates for the lower accuracy of the cheap CNNs through the judicious use of an expensive CNN. Experiments on commercial video streams show that Focus is 48× (up to 92×) cheaper than using expensive CNNs for ingestion, and provides 125× (up to 607×) lower query latency than a state-of-the-art video querying system (NoScope).
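The abstract describes a split of work between ingest time and query time. At ingest time, a cheap classifier builds an approximate index. At query time, an expensive classifier checks only the indexed candidates. The sketch below illustrates that split with stand-in classifiers passed as plain functions. The top-K indexing rule, the `top_k` parameter, and the function names are assumptions for illustration, not Focus's actual design or API. Other details of the real system are omitted.

```python
from collections import defaultdict
from typing import Callable

# Stand-in types: a cheap classifier returns (class_name, score) pairs for a frame;
# an expensive classifier returns the single class it believes is correct.
CheapClassifier = Callable[[str], list[tuple[str, float]]]
ExpensiveClassifier = Callable[[str], str]


def build_index(frames: list[str], cheap: CheapClassifier, top_k: int) -> dict[str, set[str]]:
    """Ingest time: file each frame under the top-K classes the cheap model proposes,
    so that a later query for any of those classes can find it."""
    index: dict[str, set[str]] = defaultdict(set)
    for frame in frames:
        ranked = sorted(cheap(frame), key=lambda p: p[1], reverse=True)
        for cls, _score in ranked[:top_k]:
            index[cls].add(frame)
    return index


def query(index: dict[str, set[str]], cls: str, expensive: ExpensiveClassifier) -> list[str]:
    """Query time: run the expensive model only on indexed candidates, which
    removes false positives introduced by the cheap model."""
    candidates = sorted(index.get(cls, set()))
    return [frame for frame in candidates if expensive(frame) == cls]
```

In this sketch, `top_k` is the main trade-off. A larger K makes it less likely that the cheap model's mistakes cause a frame to be missed, but it hands more candidates to the expensive model at query time.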
Full Text Available
